Thomson Legal and Regulatory at NTCIR-3: Japanese, Chinese and English Retrieval Experiments
Abstract
Thomson Legal and Regulatory participated in the CLIR task of the NTCIR-3 workshop. We submitted formal runs for monolingual retrieval in Japanese and Chinese, and for bilingual retrieval from English to Japanese. Our main focus was on Japanese retrieval. We compared word-based and character-based indexing, as well as query formulation using characters and character bigrams. Our results show that word-based and bigram-based retrieval perform similarly for most query formulation approaches, and that both outperform character-based retrieval. For Chinese retrieval, we compared single characters against character bigrams. We also introduced a structured query to leverage both. Our results are consistent with previous work showing that character bigrams perform better than single characters. The structured query approach is promising but requires more analysis. In our bilingual runs, queries were translated using a machine-readable dictionary, and the translated terms were resegmented to match the indexing units. Our results so far are inconclusive, because we experienced unexpected query formulation issues, especially in our word-based approach.
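To make the indexing units concrete, here is a minimal Python sketch. It assumes plain Unicode strings and treats whitespace as a segment boundary. It covers single-character units, overlapping character bigrams, resegmentation of dictionary translations into bigrams, and a toy score that combines unigram and bigram matches. The function names, the 0.3/0.7 weights, and the additive match-count scoring in `structured_score` are all hypothetical illustrations. They are not the structured query the authors used, and a real system would weight matches with a proper ranking function. Word-based indexing is not shown because it requires a morphological analyzer.

```python
from collections import Counter
from typing import List


def char_unigrams(text: str) -> List[str]:
    """Single-character indexing units; whitespace is dropped."""
    return [ch for ch in text if not ch.isspace()]


def char_bigrams(text: str) -> List[str]:
    """Overlapping character bigrams within each whitespace-free run.

    A run of length 1 is kept as a unigram so that short terms are not lost.
    """
    units: List[str] = []
    for run in text.split():
        if len(run) == 1:
            units.append(run)
        else:
            units.extend(run[i:i + 2] for i in range(len(run) - 1))
    return units


def resegment(translations: List[str]) -> List[str]:
    """Map dictionary translations onto bigram indexing units (hypothetical helper)."""
    units: List[str] = []
    for term in translations:
        units.extend(char_bigrams(term))
    return units


def structured_score(query: str, doc: str,
                     w_uni: float = 0.3, w_bi: float = 0.7) -> float:
    """Toy score: weighted count of query unigrams and bigrams found in doc.

    The weights and the additive combination are illustrative assumptions.
    """
    doc_uni = Counter(char_unigrams(doc))
    doc_bi = Counter(char_bigrams(doc))
    uni_hits = sum(1 for u in char_unigrams(query) if doc_uni[u] > 0)
    bi_hits = sum(1 for b in char_bigrams(query) if doc_bi[b] > 0)
    return w_uni * uni_hits + w_bi * bi_hits
```

For example, `char_bigrams("情報検索")` yields `['情報', '報検', '検索']`, and `char_unigrams("情報検索")` yields the four single characters. Bigrams preserve some local word structure without needing a segmenter. This is one common explanation for why bigram indexing tends to beat single characters.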
Related papers
Thomson Legal and Regulatory at NTCIR-4: Monolingual and Pivot-Language Retrieval Experiments
Thomson Legal and Regulatory participated in the CLIR task of the NTCIR-4 workshop. We submitted formal runs for monolingual retrieval in Japanese, Chinese and Korean. Our bilingual runs from Chinese and Korean to Japanese rely on English as a pivot language. During our monolingual experiments, we compared building stopword lists using query logs to building stopword lists from collection stati...
Thomson Legal and Regulatory at NTCIR-5: Japanese and Korean Experiments
Thomson Legal and Regulatory participated in the CLIR task of the NTCIR-5 workshop. We submitted formal runs for monolingual retrieval in Japanese and Korean, as well as for bilingual English-to-Japanese retrieval. We employed enhanced tokenization for our Japanese and Korean runs and applied a novel selective pseudo-relevance feedback scheme for Japanese. Our bilingual search participation was...
Experiments on Cross-language and Patent Retrieval at NTCIR-3 Workshop
The Berkeley group participated in the cross-language retrieval task and the patent retrieval task at the third NTCIR workshop. This paper describes our experiments on cross-language and patent retrieval. We present an automatic relevance feedback procedure for a document ranking formula based on logistic regression, and a procedure for automatically extracting Chinese/Japanese translations of Eng...
Justsystem-Clairvoyance CLIR Experiments at NTCIR-4 Workshop
At the NTCIR-4 workshop, Justsystem Corporation and Clairvoyance Corporation jointly participated in the Cross-Language Retrieval Task (CLIR). We submitted results to the SLIR and BLIR sub-tracks. For the SLIR track, we submitted Chinese, English, and Japanese monolingual runs. For the BLIR track, we submitted Japanese-English and Chinese-English runs. The major goal of our particip...
CMU in Cross-Language Information Retrieval at NTCIR-3
We participated in the Cross-Language Information Retrieval evaluation at NTCIR-3 for the English-Chinese and English-Japanese tasks. We examined several approaches to query translation, including a commercial machine translation system, a thesaurus automatically extracted from a parallel corpus, and a general-purpose online dictionary. The MT-based approach was most effective...